Beyond the Listening Test: An Interactive Approach to TTS Evaluation

Authors

  • Joseph Mendelson
  • Matthew P. Aylett
Abstract

Traditionally, subjective text-to-speech (TTS) evaluation is performed through audio-only listening tests, in which participants evaluate unrelated, context-free utterances. The ecological validity of these tests is questionable, as they do not represent real-world end-use scenarios. In this paper, we examine a novel approach to TTS evaluation in an imagined end-use, via a complex interaction with an avatar. Six voice conditions were tested: Natural speech, Unit Selection, and Parametric Synthesis, each in neutral and expressive realizations. Results were compared to a traditional audio-only evaluation baseline. Participants in both studies rated the voices for naturalness and expressivity. The baseline study showed canonical results for naturalness: Natural speech scored highest, followed by Unit Selection, then Parametric Synthesis. Expressivity was clearly distinguishable in all conditions. In the avatar interaction study, participants rated naturalness in the same order as the baseline, though with a smaller effect size; expressivity was not distinguishable. Further, no significant correlations were found between cognitive or affective responses and any voice condition. These findings highlight two primary challenges in designing more valid TTS evaluations: first, in real-world use cases involving interaction, listeners generally interact with a single voice, making comparative analysis unfeasible; second, in complex interactions, the context and content may confound perception of voice quality.


Similar resources

Power Spectral Densit Equalization of Large Sp Concatenative T

This paper proposes a channel equalization algorithm for a large speech database with application in concatenative TTS systems. The convolutional channel distortion is equalized by comparing the power spectral densities (PSDs) of utterances of different recording sessions. Autoregressive linear filters are designed on a corpus level and are used offline to filter the corresponding sentences to ...


Synthesizing and evaluating an artificial language: klingon

The synthesis of an artificial language can provide some interesting extensions for the evaluation of text-to-speech (TTS) systems. For the alternative evaluation of the TTS system DRESS, a new module for the artificial language Klingon has been developed. The linguistic and phonetic structure of Klingon can be modeled mainly by rules, with few exceptions. This contribution introduces the multi...


The Design of Czech Language Formal Listening Tests for the Evaluation of TTS Systems

This paper presents an attempt to design listening tests for the evaluation of Czech synthetic speech. The design is based on standardized and widely used listening tests for English; therefore, we can benefit from the advantages provided by standards. Bearing the Czech language phenomena in mind, we filled the standard frameworks of several listening tests, especially the MRT (Modified Rhyme Test...


Robust Methodology for TTS Enhancement Evaluation

The paper points to problematic and usually neglected aspects of using listening tests for TTS evaluation. It shows that simple random selection of phrases to be listened to may not cover those cases which are relevant to the evaluated TTS system. Also, it shows that a reliable phrase set cannot be chosen without a deeper knowledge of the distribution of differences in synthetic speech, which a...


Prosodic Phrases and Semantic Accents in Speech Corpus for Czech TTS Synthesis

We describe a statistical method for assignment of prosodic phrases and semantic accents in read speech data. The method is based on statistical evaluation of listening test data by a maximum-likelihood approach with parameters estimated by an EM algorithm. We also present linguistically relevant quantitative results about the prosodic phrase and semantic accent distribution in 250 Czech




Publication date: 2017